Principal balances of compositional data for regression and classification using partial least squares

Abstract

High-dimensional compositional data are commonplace in the modern omics sciences, among others. Analysis of compositional data requires a proper choice of a log-ratio coordinate representation, since their relative nature is not compatible with the direct use of standard statistical methods. Principal balances, a particular class of orthonormal log-ratio coordinates, are well suited to this context, as they are constructed so that the first few coordinates capture most of the variability in the data set. Focusing on regression and classification problems in high dimensions, we propose a novel partial least squares (PLS) procedure to construct principal balances that maximize the explained variability of the response variable and notably ease interpretability when compared to the ordinary PLS formulation. The proposed PLS balance approach can be understood as a generalized version of common log-contrast models since, instead of just one, multiple log-contrasts are estimated simultaneously. We demonstrate the performance of the method using both simulated and empirical data sets.
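As a rough illustration of the terms used in the abstract, the sketch below computes a single balance, that is, a normalized log-contrast between two groups of parts. It uses the standard textbook definition of a balance and is not the PLS procedure proposed in the paper; the function names `balance_contrast` and `balance`, the grouping of parts, and the example composition are all hypothetical.

```python
import math

def balance_contrast(n_parts, plus, minus):
    """Coefficient vector of the log-contrast for one balance (illustrative helper).

    `plus` and `minus` are disjoint index lists of the parts in the numerator
    and denominator groups. The coefficients sum to zero and have unit norm.
    """
    r, s = len(plus), len(minus)
    coef = [0.0] * n_parts
    for i in plus:
        coef[i] = math.sqrt(s / (r * (r + s)))
    for i in minus:
        coef[i] = -math.sqrt(r / (s * (r + s)))
    return coef

def balance(x, plus, minus):
    """Balance of composition x: the log-contrast applied to the log parts.

    Equivalent to sqrt(r*s/(r+s)) * ln(gm(x[plus]) / gm(x[minus])),
    where gm denotes the geometric mean.
    """
    coef = balance_contrast(len(x), plus, minus)
    return sum(c * math.log(v) for c, v in zip(coef, x))

# Hypothetical 4-part composition, split into parts {0, 1} versus {2, 3}.
x = [0.1, 0.2, 0.3, 0.4]
plus, minus = [0, 1], [2, 3]
coef = balance_contrast(len(x), plus, minus)
b = balance(x, plus, minus)

# Rescaling the composition leaves the balance unchanged (relative information only).
b_rescaled = balance([5.0 * v for v in x], plus, minus)
```

Because the coefficients sum to zero, the balance depends only on ratios between parts, which is why such coordinates suit data that carry relative information. A set of principal balances would consist of several such contrasts that are mutually orthogonal.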

Similar resources

Classification using partial least squares with penalized logistic regression

MOTIVATION: One important aspect of data-mining of microarray data is to discover the molecular variation among cancers. In microarray studies, the number n of samples is relatively small compared to the number p of genes per sample (usually in thousands). It is known that standard statistical methods in classification are efficient (i.e. in the present case, yield successful classifiers) partic...

Comparison of Partial Least Squares and Principal Components Regression of Chemometric Data

The data for this analysis, from Umetrics (1995), come from the field of drug discovery. New drugs are developed from chemicals that are biologically active. Testing a compound for biological activity is an expensive procedure, so it is useful to be able to predict biological activity from cheaper chemical measurements. In fact, computational chemistry makes it possible to calculate certain che...

Partial least squares methods: partial least squares correlation and partial least square regression.

Partial least square (PLS) methods (also sometimes called projection to latent structures) relate the information present in two data tables that collect measurements on the same set of observations. PLS methods proceed by deriving latent variables which are (optimal) linear combinations of the variables of a data table. When the goal is to find the shared information between two tables, the ap...

Partial Least Squares Regression (PLS)

Number of latents: The same number of factors will be extracted for PLS responses as for PLS factors. The researcher must specify how many latents to extract (in SPSS the default is 5). There is no one criterion for deciding how many latents to employ. Common alternatives are: 1. Cross-validating the model with increasing numbers of factors, then choosing the number with minimum prediction error...

Partial Least Squares (PLS) Regression

PLS regression is a recent technique that generalizes and combines features from principal component analysis and multiple regression. It is particularly useful when we need to predict a set of dependent variables from a (very) large set of independent variables (i.e., predictors). It originated in the social sciences (specifically economics, Herman Wold 1966) but became popular first in chemomet...

Journal

Journal title: Journal of Chemometrics

Year: 2023

ISSN: 1099-128X, 0886-9383

DOI: https://doi.org/10.1002/cem.3518